
Feature Selection Methods Based On Mutual Information For Classifying Heterogeneous Features

Abstract

Datasets with heterogeneous features can lead to inappropriate feature selection results, because heterogeneous features are difficult to evaluate concurrently. Feature transformation (FT) is an alternative way to handle feature subset selection for heterogeneous features. However, transforming non-numerical features into numerical ones may introduce redundancy with the original numerical features. In this paper, we propose a method that selects a feature subset based on mutual information (MI) for classifying heterogeneous features. We use unsupervised feature transformation (UFT) and joint mutual information maximization (JMIM). UFT is used to transform non-numerical features into numerical features, and JMIM is used to select a feature subset while taking the class label into account. The transformed and original features are combined into a single set, a feature subset is determined with JMIM, and the selected features are classified with a support vector machine (SVM). Classification accuracy is measured for each number of selected features and compared between the UFT-JMIM and Dummy-JMIM methods. Averaged over all experiments in this study, UFT-JMIM achieves a classification accuracy of about 84.47% and Dummy-JMIM about 84.24%. This result shows that UFT-JMIM can minimize information loss between the transformed and original features, and can select a feature subset that avoids redundant and irrelevant features.
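The JMIM selection step described above can be illustrated with a short Python sketch. It assumes every column is already discrete (for example, UFT output that has been discretized) and estimates MI from empirical counts. It follows the usual greedy JMIM criterion. First it picks the feature with the largest I(f; C). Then it repeatedly adds the candidate f_i that maximizes the minimum, over already-selected f_s, of I(f_i, f_s; C). The function names (`entropy`, `mutual_information`, `jmim_score`, `jmim`), the toy XOR data, and the alphabetical tie-breaking are illustrative assumptions, not the authors' implementation. The UFT transformation and the SVM classifier are not shown.

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from math import log2


def entropy(values: Sequence[Hashable]) -> float:
    """Plug-in Shannon entropy (in bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """I(X; Y) = H(X) + H(Y) - H(X, Y), estimated from empirical counts."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def jmim_score(candidate: str, selected: list[str],
               features: dict[str, Sequence[Hashable]],
               labels: Sequence[Hashable]) -> float:
    """Worst case over selected s of the joint MI I((candidate, s); C)."""
    return min(
        mutual_information(list(zip(features[candidate], features[s])), labels)
        for s in selected
    )


def jmim(features: dict[str, Sequence[Hashable]],
         labels: Sequence[Hashable], k: int) -> list[str]:
    """Hypothetical helper: greedily select up to k feature names with JMIM."""
    remaining = sorted(features)  # sorted so that ties break deterministically
    if not remaining or k <= 0:
        return []
    # Step 1: start with the feature that has the highest MI with the class.
    first = max(remaining, key=lambda f: mutual_information(features[f], labels))
    selected = [first]
    remaining.remove(first)
    # Step 2: add the candidate whose worst-case joint MI with the class is largest.
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: jmim_score(f, selected, features, labels))
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    labels = [0, 1, 1, 0, 0, 1, 1, 0]  # class = a XOR b
    a = [0, 0, 1, 1, 0, 0, 1, 1]
    b = [0, 1, 0, 1, 0, 1, 0, 1]
    features = {"a": a, "a_dup": list(a), "b": b}  # a_dup is a redundant copy of a
    print(jmim(features, labels, k=2))  # → ['a', 'b']
```

In the toy data, `a_dup` duplicates `a`, so it adds no joint information about the class once `a` is selected. Meanwhile `b` completes the XOR relation. The sketch therefore picks `b` and skips the redundant column. This mirrors the abstract's point about avoiding redundant features after combining transformed and original columns.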